ABSTRACT 

Character recognition is a difficult task. It is the process of classifying an input character into one of a set of predefined character classes. Many researchers have worked on English character recognition in recent years, and this work has produced technology that is used in practical applications. For the Devanagari script, however, much less work has been done because of its complicated structure. Devanagari is the script of the third most spoken language in the world, so this system is proposed to give maximum accuracy in minimum time at low cost. The system could be used to recognize printed information on documents such as cheques, envelopes, forms and other manuscripts, which has a variety of practical and commercial applications in banks, post offices, libraries and publishing houses. Instead of a single feature, multiple features are used: GLCM, dominant color, affine moment invariants and histogram. 
INTRODUCTION 

Character recognition is the process of classifying an input character according to predefined character classes. With the growing use of computer applications, modern society needs input text in a computer-readable form. This research is a simple approach toward that goal, as an initial step in converting input text into a computer-readable form. Some research on handwritten characters has already been done using artificial neural networks, and rapidly growing computational power makes character recognition (CR) methods increasingly practical to implement. Digital document processing is gaining popularity in office and library automation, bank and postal services, publishing houses and communication technology. English CR has been studied extensively over the last half century and has progressed to a level sufficient to produce technology-driven applications. The same is not true for Indian languages, which are complicated in terms of structure and computation. Devanagari, the script of Hindi, the official language of the Union of India spoken by more than 500 million people, should be given special attention so that the rich body of ancient and modern Indian literature can be effectively retrieved and analysed. 

SYSTEM DEVELOPMENT 

Character recognition is difficult because of varying font sizes, writing styles and the similar shapes of many characters. This system is therefore proposed to give maximum accuracy in minimum time. It is needed in applications where data present on paper documents has to be transferred into a machine-readable format. Automatic recognition of printed and handwritten information on documents such as cheques, envelopes, forms and other manuscripts has a variety of practical and commercial applications in banks, post offices, libraries and publishing houses. Instead of a single feature, multiple features are used here: GLCM, dominant color, affine moment invariants and histogram. 

Proposed Character Recognition System 


[Block diagram: Database input → Input image → Pre-processing → Feature extraction → Recognition]

Figure 1: Block Diagram of Proposed Character Recognition System 


The block diagram above represents the working flow of the proposed system. In the first block, an image is selected as the input image for feature extraction to generate the database. In the next block, the selected image is pre-processed, i.e. the color image is converted to a binary image and the converted image is filtered. The next block extracts features such as GLCM, affine moment invariants and histogram, and the extracted features are stored in a .mat file. For recognition, a new input image is selected and the same pre-processing steps are repeated. Finally, the characters are recognized in the recognition block, and the recognized character is shown in the last block, i.e. the output image. 
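The flow in Figure 1 amounts to two passes over the same feature extractor: one that fills a database and one that compares a new input against it. The sketch below shows only that structure. `extract_features`, `build_database` and `recognize` are hypothetical names; the two-number feature vector stands in for the GLCM, histogram and moment features; pre-processing is assumed to be already done; and nearest-neighbour matching is a plain stand-in for the ANFIS classifier, not the author's method. A Python dict takes the place of the .mat file.

```python
from math import dist

# A binary image is a list of rows; 1 = ink, 0 = background.
Image = list[list[int]]
Features = tuple[float, ...]


def extract_features(img: Image) -> Features:
    """Hypothetical feature vector: ink area and bounding-box height/width ratio.

    Assumes the image is already pre-processed and contains at least one ink pixel.
    """
    rows = [r for r, row in enumerate(img) if any(row)]
    cols = [c for c in range(len(img[0])) if any(row[c] for row in img)]
    area = sum(map(sum, img))
    height = rows[-1] - rows[0] + 1
    width = cols[-1] - cols[0] + 1
    return (float(area), height / width)


def build_database(samples: dict[str, Image]) -> dict[str, Features]:
    """Database generation: store one feature vector per labelled sample image."""
    return {label: extract_features(img) for label, img in samples.items()}


def recognize(img: Image, database: dict[str, Features]) -> str:
    """Recognition stand-in: label of the nearest stored feature vector (not ANFIS)."""
    feats = extract_features(img)
    return min(database, key=lambda label: dist(feats, database[label]))


db = build_database({
    "bar": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],  # area 3, height/width 3.0
    "block": [[1, 1], [1, 1]],                 # area 4, height/width 1.0
})
print(recognize([[1], [1], [1], [1]], db))  # area 4, height/width 4.0 -> bar
```

Keeping feature extraction in one function is the point of the design: the database pass and the recognition pass must describe characters in exactly the same way.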


PERFORMANCE ANALYSIS 

Introduction 

The aim of this character recognition system is to develop a character recognition algorithm that achieves high accuracy and high recognition speed. This section also presents graphs of feature extraction, neural network training and the character recognition rate. 


Database 


I selected characters of different font sizes and font faces to generate the database. I installed the Kruti Devnagri software on my system, typed the characters into a Word document and took a snapshot. After cropping the required part, the image was saved under the name "new database". Figure 2 (a), (b), (c), (d) and (e) shows the images used to generate the database. 



Optimization of Optical Character Recognition for Printed Devanagari Text Using ANFIS Techniques 




Figure 2: (a) (b) (c) (d) (e) Database Images 


Pre-Processing 

Converting Original Image to Gray Scale Image 


Figure 3 (a), (b) and (c) represents the pre-processing steps. First, the complete set of characters is converted into a gray-scale image. 
Figure 3: (a) Original Image (b) Gray scale Image (c) Binary Image 
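As a sketch of the gray-scale step, the function below forms a weighted sum of the R, G and B values of each pixel. The weights 0.2989, 0.5870 and 0.1140 are the ones MATLAB documents for `rgb2gray`; the document does not say which conversion was used, so this is an assumption. Images are plain nested lists of 0-255 integers to keep the example dependency-free, and `rgb_to_gray` is a hypothetical name.

```python
def rgb_to_gray(rgb):
    """Weighted sum of R, G, B (each 0-255) per pixel, rounded to an integer gray level."""
    return [
        [round(0.2989 * r + 0.5870 * g + 0.1140 * b) for (r, g, b) in row]
        for row in rgb
    ]


print(rgb_to_gray([[(255, 255, 255), (0, 0, 0), (255, 0, 0)]]))  # [[255, 0, 76]]
```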


Converting Gray Scale Image to Binary Image 

The second step converts the gray-scale image into a binary image. For this, MATLAB code is used that produces a binary output matrix with the same dimensions as the input image, so that after binarization the image consists only of 0s and 1s. 
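The document does not state how the binarization threshold is chosen. A common choice, and the one behind MATLAB's `graythresh`, is Otsu's method, which picks the gray level that maximises the between-class variance of the image histogram. The sketch below assumes that method and assumes dark ink on light paper, so ink pixels become 1; `otsu_threshold` and `binarize` are hypothetical names.

```python
def otsu_threshold(gray, levels=256):
    """Gray level t that maximises the between-class variance of pixels <= t vs > t."""
    hist = [0] * levels
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = sum0 = 0  # pixel count and intensity sum of the dark class (levels <= t)
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so this split is meaningless
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2  # proportional to Otsu's criterion
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(gray):
    """Dark ink on light paper: pixels at or below the threshold become 1 (ink)."""
    t = otsu_threshold(gray)
    return [[1 if v <= t else 0 for v in row] for row in gray]


gray = [[10, 12, 200], [11, 210, 205]]
print(otsu_threshold(gray), binarize(gray))  # 12 [[1, 1, 0], [1, 0, 0]]
```

Note that MATLAB's `imbinarize` marks pixels above the threshold as 1; the inverted convention here is chosen so that the character itself is the foreground for the later shape measurements.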


Filtering of Binary Image 


The next step is noise removal; Figure 4 (b) shows the resulting noise-free image. Median filtering is a nonlinear operation often used in image processing to reduce "salt and pepper" noise. A median filter is more effective than convolution when the goal is to reduce noise and preserve edges at the same time. 
Figure 4: (a) Binary Image to (b) Filtered Image 
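A median filter replaces each pixel with the median of its neighbourhood, so an isolated noise pixel is outvoted by its neighbours while a straight edge survives. The sketch below applies a 3 x 3 median to a 0/1 image in plain Python. Border windows are simply truncated here; MATLAB's `medfilt2` zero-pads by default, so results at the image border can differ. `median_filter` is a hypothetical name.

```python
from statistics import median_low


def median_filter(img, k=3):
    """k x k median filter for a 0/1 image.

    Windows at the border are truncated to the pixels that exist; median_low keeps
    the output binary when such a window has an even number of pixels.
    """
    h, w, r = len(img), len(img[0]), k // 2
    return [
        [
            median_low([
                img[yy][xx]
                for yy in range(max(0, y - r), min(h, y + r + 1))
                for xx in range(max(0, x - r), min(w, x + r + 1))
            ])
            for x in range(w)
        ]
        for y in range(h)
    ]


noisy = [[0] * 5 for _ in range(5)]
noisy[2][2] = 1  # a single "salt" pixel
print(median_filter(noisy) == [[0] * 5 for _ in range(5)])  # True: the isolated pixel is removed

edge = [[1, 1, 0, 0, 0] for _ in range(5)]
print(median_filter(edge) == edge)  # True: the vertical edge is preserved
```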

It is not possible to show the parameters of every character in this report, so only a few characters are shown in the diagram. First, each line is separated from the complete image using line-by-line segmentation, as shown in the following figure. Each character is then segmented from its line by character-by-character segmentation. At the same time, the centroid of each character is found; it is marked by a red cross, and the box around the character is drawn in blue. 
Figure 5: Separation of Each Character Using Character by Character Segmentation 
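The document does not say how line and character segmentation are implemented. One simple approach, assumed in the sketch below, uses projection profiles: rows with no ink separate text lines, and columns with no ink separate characters within a line. Boxes are returned as hypothetical (top, bottom, left, right) tuples with exclusive ends, and `centroid` returns the mean (row, column) of the ink pixels in a box. In running Devanagari text the headline (shirorekha) joins neighbouring characters of a word, so column projection alone only separates isolated characters such as the database samples; joined text would need the headline handled first.

```python
def runs(profile):
    """(start, end) pairs, end exclusive, of consecutive non-zero entries in a profile."""
    spans, start = [], None
    for i, v in enumerate(profile):
        if v and start is None:
            start = i
        elif not v and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment(img):
    """Split a binary page into lines (row projection), then characters (column projection)."""
    boxes_per_line = []
    for top, bottom in runs([sum(row) for row in img]):
        line = img[top:bottom]
        col_profile = [sum(row[c] for row in line) for c in range(len(img[0]))]
        boxes_per_line.append([(top, bottom, left, right) for left, right in runs(col_profile)])
    return boxes_per_line


def centroid(img, box):
    """Mean (row, column) of the ink pixels inside a (top, bottom, left, right) box."""
    top, bottom, left, right = box
    pts = [(r, c) for r in range(top, bottom) for c in range(left, right) if img[r][c]]
    return (sum(r for r, _ in pts) / len(pts), sum(c for _, c in pts) / len(pts))


page = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
]
print(segment(page))                 # [[(0, 2, 0, 2), (0, 2, 4, 5)], [(3, 4, 1, 2)]]
print(centroid(page, (0, 2, 0, 2)))  # (0.5, 0.5)
```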


The following MATLAB windows show the samples taken to generate the database. The values of each character are stored in structures, displayed as <1x1 struct>. Each structure contains the area occupied by the character, its centroid, bounding box, eccentricity, orientation, binary image values and perimeter. Each character has a unique set of values for these parameters. All these values are stored in the findchardata file, a .mat file, as shown in Figure 6. 
Figure 6: Sample Taken to Generate Database 
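The structure fields listed above share their names with MATLAB `regionprops` properties. As an illustration only, the sketch below recomputes three of them (Area, Centroid and BoundingBox) for one character in plain Python, following the MATLAB conventions: 1-based [x, y] coordinates with x as the column, and a bounding box measured from pixel edges, which is why 0.5 appears. The function name `region_props` is hypothetical, and Perimeter is omitted.

```python
def region_props(char_img):
    """Area, Centroid [x, y] and BoundingBox [left, top, width, height] of the ink pixels."""
    pts = [(r, c) for r, row in enumerate(char_img) for c, v in enumerate(row) if v]
    area = len(pts)
    rows = [r for r, _ in pts]
    cols = [c for _, c in pts]
    # +1 converts 0-based indices to MATLAB's 1-based pixel centres.
    centroid = [sum(cols) / area + 1, sum(rows) / area + 1]
    # Left/top edges sit half a pixel before the first pixel centre.
    bbox = [min(cols) + 0.5, min(rows) + 0.5,
            max(cols) - min(cols) + 1, max(rows) - min(rows) + 1]
    return {"Area": area, "Centroid": centroid, "BoundingBox": bbox}


print(region_props([[0, 0, 0], [0, 1, 1], [0, 1, 0]]))
# {'Area': 3, 'Centroid': [2.333..., 2.333...], 'BoundingBox': [1.5, 1.5, 2, 2]}
```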


Post-Processing 

Feature Extraction of Character 


Each <1x1 struct> in the table contains the binary image coefficients, area, centroid, eccentricity, orientation, bounding box and perimeter of a single character. Since the features of every character cannot be shown in this report, two characters are presented with the details of their parameters. Going step by step, the first parameter for character ^ is its area, i.e. 187. The second is the centroid of the character, which is given as a <1x2 double>. 

Table 1: Parameters of Character ^ 

Table 2: Centroid of Character ^ 

1          2
34.9412    89.6417


The bounding box of each character, which gives the boundary of the character, is a <1x4 double>. The eccentricity of this particular character is 0.8014, its orientation is -63.3618 and its perimeter is 132.0416. 
Table 3: Value of Bounding Box of Character ^ 
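Eccentricity and orientation values like those quoted above are what MATLAB's `regionprops` returns. As an illustration only, the sketch below computes both from the second central moments of the ink pixels, following the ellipse-fitting approach regionprops uses: each pixel contributes an extra 1/12 (the second moment of a unit square), and y is measured upwards so that angles run counter-clockwise in degrees. It is not guaranteed to match MATLAB to the last digit, and `ellipse_props` is a hypothetical name.

```python
from math import atan, degrees, sqrt


def ellipse_props(char_img):
    """(eccentricity, orientation in degrees) of the ellipse with the same second moments."""
    pts = [(r, c) for r, row in enumerate(char_img) for c, v in enumerate(row) if v]
    n = len(pts)
    xbar = sum(c for _, c in pts) / n
    ybar = sum(r for r, _ in pts) / n
    xs = [c - xbar for _, c in pts]
    ys = [ybar - r for r, _ in pts]  # y measured upwards
    uxx = sum(x * x for x in xs) / n + 1 / 12  # + 1/12: second moment of a unit pixel
    uyy = sum(y * y for y in ys) / n + 1 / 12
    uxy = sum(x * y for x, y in zip(xs, ys)) / n
    common = sqrt((uxx - uyy) ** 2 + 4 * uxy ** 2)
    major = 2 * sqrt(2) * sqrt(uxx + uyy + common)
    minor = 2 * sqrt(2) * sqrt(uxx + uyy - common)
    eccentricity = 2 * sqrt((major / 2) ** 2 - (minor / 2) ** 2) / major
    if uyy > uxx:
        num, den = uyy - uxx + common, 2 * uxy
    else:
        num, den = 2 * uxy, uxx - uyy + common
    if den == 0:
        orientation = 0.0 if num == 0 else 90.0  # num > 0 here, i.e. atan(+inf)
    else:
        orientation = degrees(atan(num / den))
    return eccentricity, orientation


print(ellipse_props([[1, 1, 1, 1, 1]]))                  # horizontal bar: orientation 0.0
print(ellipse_props([[1], [1], [1], [1], [1]]))          # vertical bar: orientation 90.0
print(ellipse_props([[1, 0, 0], [0, 1, 0], [0, 0, 1]]))  # falling diagonal: about -45
```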
The following table shows the binary values of this character, a <37x17 logical> image; the plotted graph gives a graphical representation of the character. 
GLCM of Character 

The following graph shows the GLCM, in which the texture of the character is presented as texture correlation as a function of offset. The gray-level co-occurrence matrix gives the 'contrast', 'correlation', 'energy' and 'homogeneity' values of each character, which are stored in the datafile.mat file. Since two characters were taken as examples, features of only these two characters are shown here. 


Table 5: Value of Contrast of ^ 
stats.Contrast <1x2 double> 
Graph 2: Graph of Contrast of ^ 
Table 6: Value of Correlation of ^ 
Graph 3: Graph of Correlation of ^ 


Table 7: Value of Energy of ^ 
Graph 4: Graph of Energy of ^ 
Table 8: Value of Homogeneity of ^ 
Graph 5: Graph of Homogeneity of ^ 
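The four GLCM statistics named above have standard definitions. The sketch below builds a normalized, non-symmetric co-occurrence matrix for one (row, column) offset and then computes contrast, correlation, energy and homogeneity with the formulas MATLAB documents for `graycoprops`. The `<1x2 double>` size of `stats.Contrast` suggests two offsets were used; the offsets (0, 1) and (-1, 1) in the usage lines are an assumption, as is treating a binary character as a 2-level image. `glcm` and `glcm_stats` are hypothetical names.

```python
from math import sqrt


def glcm(img, offset, levels):
    """Normalized, non-symmetric co-occurrence matrix for one (d_row, d_col) offset.

    img holds integer gray levels 0..levels-1; pairs whose second pixel falls
    outside the image are ignored.
    """
    dr, dc = offset
    h, w = len(img), len(img[0])
    counts = [[0] * levels for _ in range(levels)]
    for r in range(h):
        for c in range(w):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < h and 0 <= c2 < w:
                counts[img[r][c]][img[r2][c2]] += 1
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def glcm_stats(p):
    """Contrast, correlation, energy and homogeneity of a normalized GLCM p."""
    n = range(len(p))
    cells = [(i, j, p[i][j]) for i in n for j in n]
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    sd_i = sqrt(sum((i - mu_i) ** 2 * v for i, _, v in cells))
    sd_j = sqrt(sum((j - mu_j) ** 2 * v for _, j, v in cells))
    cov = sum((i - mu_i) * (j - mu_j) * v for i, j, v in cells)
    return {
        "Contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        # Correlation is undefined (NaN) when either marginal has zero variance.
        "Correlation": cov / (sd_i * sd_j) if sd_i > 0 and sd_j > 0 else float("nan"),
        "Energy": sum(v * v for _, _, v in cells),
        "Homogeneity": sum(v / (1 + abs(i - j)) for i, j, v in cells),
    }


char_img = [[0, 0, 1], [0, 1, 1], [1, 1, 1]]
for off in [(0, 1), (-1, 1)]:
    print(off, glcm_stats(glcm(char_img, off, levels=2)))
```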
Histogram of Character 

It is not possible to show the histogram of every character in this report, so only a few characters are shown: 

Graph 6: Histogram of ^ 
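The document does not define which histogram is used as a feature. Assuming it is the intensity histogram of the character image (as MATLAB's `imhist` produces), the sketch below counts pixels in equal-width bins over [0, max_level]; with `bins=2` and `max_level=1` it gives the background and ink counts of a binary character. `gray_histogram` is a hypothetical name.

```python
def gray_histogram(img, bins=256, max_level=255):
    """Count pixels falling into `bins` equal-width intensity bins over [0, max_level]."""
    hist = [0] * bins
    for row in img:
        for v in row:
            # Map level v to its bin; min() guards the top edge.
            hist[min(v * bins // (max_level + 1), bins - 1)] += 1
    return hist


print(gray_histogram([[0, 1, 1], [1, 0, 1]], bins=2, max_level=1))  # [2, 4]
print(gray_histogram([[0, 128, 255]], bins=4))                      # [1, 0, 1, 1]
```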

Input to Neural Network 

The features extracted from the character are provided as input to the neural network. ANFIS code is used here, in which the membership functions and the number of epochs are specified. As shown in the table below, the input values of the neural network are numeric, and their graphical representation is also shown below. 



Table 9: Input Values Provided to NN 




Graph 7: Input Values Provided to NN 
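ANFIS combines fuzzy membership functions with a Sugeno-type rule base whose parameters are learned over the training epochs. To make that structure concrete, the sketch below is a forward pass of a two-input, first-order Sugeno ANFIS with Gaussian membership functions and product firing strengths. The document does not give its network configuration, so the number of inputs, the membership-function type and all parameter values are assumptions; training (in Jang's original formulation, a hybrid of least squares and gradient descent) is not shown. `gauss_mf` and `anfis_forward` are hypothetical names.

```python
from math import exp


def gauss_mf(x, centre, sigma):
    """Gaussian membership function (ANFIS layer 1)."""
    return exp(-((x - centre) ** 2) / (2 * sigma ** 2))


def anfis_forward(x1, x2, mfs1, mfs2, consequents):
    """Forward pass of a two-input, first-order Sugeno ANFIS.

    mfs1, mfs2  : (centre, sigma) for each membership function of input 1 and input 2
    consequents : one (p, q, r) per rule, in the order the rules are generated below;
                  each rule outputs p*x1 + q*x2 + r
    """
    mu1 = [gauss_mf(x1, c, s) for c, s in mfs1]   # layer 1: membership grades
    mu2 = [gauss_mf(x2, c, s) for c, s in mfs2]
    w = [a * b for a in mu1 for b in mu2]         # layer 2: firing strengths (product)
    if len(consequents) != len(w):
        raise ValueError("need one (p, q, r) triple per rule")
    total = sum(w)
    w_norm = [wi / total for wi in w]             # layer 3: normalization
    # Layers 4 and 5: weighted linear rule outputs, summed into one output.
    return sum(wn * (p * x1 + q * x2 + r) for wn, (p, q, r) in zip(w_norm, consequents))


# Hypothetical parameters: "low" and "high" sets on [0, 1] for each input, 4 rules.
mfs = [(0.0, 0.4), (1.0, 0.4)]
rules = [(0, 0, 0), (0, 0, 1), (0, 0, 1), (0, 0, 2)]
print(anfis_forward(0.2, 0.9, mfs, mfs, rules))
```

Because the normalized firing strengths sum to 1, the output is a weighted average of the rule outputs; this is what lets ANFIS fit the linear consequent parameters by least squares.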


Output of NN 

The output of the neural network is represented by the following graph, and the recognized output is shown in the following figure (a), (b). 

Table 10: Output of NN 
Graph 8: Output of NN 
Recognition Accuracy (Recognition Rate) 



Figure 7: Final Output i.e. 100% Recognised Character 

For the best recognition system, the recognition accuracy should be high. It is calculated as the ratio of the number of correctly recognized character samples to the total number of samples used in testing:

$$\text{Recognition accuracy (\%)} = \frac{\text{No. of correctly recognized samples}}{\text{Total no. of testing samples}} \times 100$$
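The recognition-accuracy formula translates directly into code. The sketch computes the accuracy for one character and an unweighted average over characters; whether the reported 92.66% is such a per-character average or a pooled figure over all test samples is not stated, so the averaging choice is an assumption, and the labels and counts in the usage line are hypothetical.

```python
def recognition_accuracy(correct, total):
    """Accuracy in percent: correctly recognized samples / total test samples x 100."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total


def average_accuracy(per_char):
    """Unweighted mean of per-character accuracies; per_char maps label -> (correct, total)."""
    values = [recognition_accuracy(c, t) for c, t in per_char.values()]
    return sum(values) / len(values)


print(average_accuracy({"char_1": (9, 10), "char_2": (10, 10)}))  # 95.0
```

When characters have different numbers of test samples, this average differs from the pooled ratio (total correct over total tested), so the two should not be compared directly.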


Table 11: Character Recognition Test Results 
Character Recognition Accuracy Graph 



Graph 9: Character Recognition Accuracy Graph 
Comparison with Existing Systems 

The accuracy of the proposed system is compared with the accuracy of existing systems. 

Table 12: Comparison with Existing Systems 

Reference No.   | Feature                    | Classifier                                    | Accuracy (%)
[14]            | GSC                        | Neural network                                | 61.8
[15]            | Structural and statistical | Hausdorff image comparison                    | 66.78
[12]            | Statistical                | Tree classifier and template matching         | 83.67
[14]            | SFSA                       | Stochastic finite state automaton             | 87
Proposed system | Structural and statistical | Adaptive neuro-fuzzy inference system (ANFIS) | 92.66 (average)


Comparison of Character Recognition Accuracy among Different Existing Methods: 



Graph 10: Graph Representing Comparison between Different Existing Methods 

Graph 10 represents the accuracy of the different methods. It shows that the proposed method achieves a higher accuracy (92.66%) than the existing methods. 

CONCLUSIONS 

Character recognition is a difficult task because a wide variety of font sizes and font faces is in use today. This work tries to achieve maximum accuracy while reducing the time required to recognize a character. The proposed method can hopefully inspire new thinking and new ways to tackle the character recognition problem. The performance of the proposed method is measured in terms of recognition accuracy. The features used for character recognition, i.e. GLCM, dominant colour, histogram and affine moment invariants, give good results compared with others, and the ANFIS (adaptive neuro-fuzzy inference system) technique used for recognition gives the best result among the compared techniques. 

Considering only the two example characters, i.e. ^ and ^, the system gives approximately 92% accuracy. When all Devanagari characters are considered, some characters are misrecognized; the recognition rate over all Devanagari characters is about 92.66%. 